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Abstract. The use of the proportional odds (PO) model for ordinal regression is 
ubiquitous in the literature. If the assumption of parallel lines does not hold for 
the data, then an alternative is to specify a non-proportional odds (NPO) model, 
where the regression parameters are allowed to vary depending on the level of the 
response. However, it is often difficult to fit these models, and challenges regarding 
model choice and fitting are further compounded if there are a large number of 
explanatory variables. We make two contributions towards tackling these issues: 
firstly, we develop a Bayesian method for fitting these models, that ensures the 
stochastic ordering conditions hold for an arbitrary finite range of the explanatory 
variables, allowing NPO models to be fitted to any observed data set. Secondly, 
we use reversible-jump Markov chain Monte Carlo to allow the model to choose 
between PO and NPO structures for each explanatory variable, and show how 
variable selection can be incorporated. These methods can be adapted for any 
monotonic increasing link functions. We illustrate the utility of these approaches 
on novel data from a longitudinal study of individual-level risk factors affecting 
body condition score in a dog population in Zenzele, South Africa. 

Keywords: Bayesian inference, ordinal regression, Markov chain Monte Carlo, 
reversible-jump, Bayesian model choice. 


1 Introduction 

The most common regression models for analysing ordinal data fall under the set of cu¬ 
mulative link models, in which the categories of the response variable can be modelled as 
contiguous intervals on some continuous scale (McCullagh, 1980). A general monotonic 
increasing link function is then used to map these intervals from the continuous scale 
onto the interval (0,1). The choice of link function will generally lead to qualitatively 
similar model fits, and so can be chosen on the basis of interpretability (McCullagh, 
1980) or convenient mathematical properties (e.g. Albert and Chib, 1993). To this end 
we concentrate on the logistic link, leading to comparisons of the cumulative odds. 

A popular implementation assumes that the relationship between the cumulative 
log-odds and the explanatory variables does not depend on the response category (the 
proportional odds [PO] model—McCullagh, 1980). Under simple constraints, this imple¬ 
mentation guarantees that the model exhibits stochastic ordering (i.e. it ensures that 
for J ordered groups, the cumulative probability of belonging to group j is less than or 
equal to the cumulative probability of belonging to group j +1, for j = 1,..., J — 1). As 
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a direct result of its simplicity and ease-of-interpretation, the PO model is commonly 
used in the literature. 

An alternative is to allow the relationship between the response and explanatory 
variables to vary by response category (the non-proportional odds [NPO] model). Whilst 
more flexible than the PO model, the use of NPO models in the literature is limited, 
since the stochastic ordering conditions will only hold for a limited range of values 
of the explanatory variables (Agresti, 2010). This means that many traditional fitting 
mechanisms can fail to fit (Tutz and Scholz, 2003), and in the case where the PO 
assumption fails to hold, it may be difficult to find estimates for the general NPO model. 
A useful alternative is to fit a set of J — 1 separate binary logistic regression models to 
each cumulative logit separately (Bender and Grouven, 1998; Cole et ah, 2004), however, 
the parameter estimates for the regression parameters do not always correspond to those 
obtained from the general model (Tutz and Scholz, 2003). The partial proportional 
odds (PPO) model (Peterson and Harrell, 1990) allows for a mixture of PO and NPO 
variables to be included, though the structure for each variable must be specified in 
advance. Various approaches have been developed as a means of assessing whether the 
PO assumption is appropriate for a set of given variables (see e.g. Brant, 1990; Agresti, 
2010), but these are difficult to apply to large numbers of variables, particularly if there 
are interactions between some of them that may impact the relationship. 

Another alternative approach—when the response variable is such that to belong to 
a particular category it is necessary to pass through all previous categories in turn—is 
to use continuation ratios (Feinberg, 1980). Good reviews of general ordinal regression 
frameworks can be found in Ananth and Kleinbaum (1997) and Tall et al. (2002). 

The motivating example for this paper is a longitudinal, individual animal-level 
study of risk factors associated with body condition score in a population of dogs. 
These data form part of a wider study to examine the impact of immunological and 
demographic factors on canine rabies vaccination coverage, which covered four loca¬ 
tions: Braamhscherville and Zenzele in Gauteng province. South Africa; and Antiga 
and Kelusa in Bali province, Indonesia. Full details of the wider study, and a compre¬ 
hensive analysis of all the data collected from each of the sites is provided in Morters 
et al. (2014). 

To illustrate the requirement and performance of the methodology, we focus at¬ 
tention on one particular data set from Zenzele. Full details of these data are given in 
Section 5. The response variable is body condition score (BGS)—defined on a scale from 
1-9 where a score of 1 is highly underweight, 5 is healthy, and 9 is highly overweight. 
As such it seems sensible to consider using cumulative link models. 

There are various challenges with modelling this system, and we expand on each 
point in the subsequent discussion: 

1. We wish to perform variable selection, in order to assess the relative impact and 
importance of a series of potential risk factors on BGS. 

2. The data are longitudinal, and so it is necessary to account for clustering due to 
repeated measurements on individual animals. 
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3. We wish to assess the weight-of-evidence for PO versus NPO structures for each of 
the variables. This information is useful in helping to build up a picture (along with 
other indirect sources of evidence—see Morters et ah, 2014) of the environmental 
processes driving canine demography in these regions (see Section 5 for a more 
detailed discussion of this point). 

4. In order to tackle point 3, it is necessary to overcome some of the challenges 
regarding the htting of NPO models when the stochastic ordering conditions may 
not hold. 

In a classical statistical framework, model choice is usually performed using some form 
of model comparison criteria, such as Akaike’s Information Criterion (AIC; Akaike, 
1974), or likelihood ratio testing. These procedures use information from a single point 
estimate of the parameters, 0, and neglect the uncertainty in 6. In addition, inference 
is made conditionally on the selected model, and does not incorporate uncertainty in 
the choice of model. This can be important in cases where explanatory variables show 
consistent evidence of an effect across a range of models, but is not selected in the ‘hnal’ 
model (see e.g. Viallefont et ah, 2001). Here we propose to use a Bayesian framework, 
and implement model selection using posterior probabilities of association (see e.g. Kass 
and Raftery, 1995; Viallefont et ah, 2001; O’Hara and Sillanpaa, 2009). This has the 
advantage that it allows us to assess the weight-of-evidence in favour of a given model, 
and also allows us to assess the evidence for a particular variable being associated with 
the response after averaging across all possible models. This is particularly important 
to this study, since we also wish to assess the conditional evidence of a PO or NPO 
structure for the relationship between an explanatory variable and the response, given 
that an association exists. 

A common method to account for clustering due to repeated measurements is to use 
mixed effects models (see e.g. Higgle et ah, 2002), in which the error term is split into 
different components in order to model the variance-covariance structure at different 
hierarchies (Hedeker and Gibbons, 1994; Gibbons and Hedeker, 1997; Hartzel et ah, 
2001; Hedeker, 2003; Liu and Hedeker, 2006). These techniques are well characterised 
in the literature, though there is some debate about how to perform model selection 
in the presence of random effects in a classical setting (see e.g. Vaida and Blanchard, 
2005; Liang et ah, 2008). In the Bayesian setting, all parameters are considered random 
variables, and it is straightforward to incorporate a priori clustering into the prior. The 
model choice problem then remains the same, as the parameters are simply integrated 
over when estimating the posterior probabilities of association. 

The literature surrounding the development of ordinal regression frameworks is large 
and varied, applied in a wide range of fields. Focussing on Bayesian models; probit link 
functions are frequently used (e.g. Albert and Chib, 1993; Chu and Ghahramani, 2005; 
Yi et ah, 2007) and there are various recent developments in modelling the link function 
using mixture distributions (e.g. Lang, 1999; Leon-Novelo et ah, 2010). Different fitting 
mechanisms have also been developed, including Markov chain Monte Carlo (MCMC— 
Lang, 1999; Ishwaran and Gatsonis, 2000; Holmes and Held, 2006; Yi et ah, 2007; Webb 
and Forster, 2008; Leon-Novelo et ah, 2010), Laplace approximations (e.g. Chu and 
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Ghahramani, 2005; Paquet et al., 2005) and expectation-propagation algorithms (Chu 
and Ghahramani, 2005). Holmes and Held (2006) develop efficient MCMC samplers for 
logistic multinomial regression models, and O’Brien and Dunson (2004) develop a multi¬ 
variate logistic regression framework that provides a marginal logistic structure for each 
of the outcomes. Some work has also been done on model selection using probit models 
(e.g. Albert and Chib, 1997; Chu and Ghahramani, 2005; Webb and Forster, 2008), 
and Mwalili et al. (2005) extend a PO model to account for interobserver measurement 
error. This list is by no means exhaustive, but as far as we are aware no method has 
been developed that accounts for all four of the challenges we highlighted earlier within 
the same framework. This manuscript is an attempt to provide an alternative approach 
for fitting logistic regression models, which allows both PO and NPO structures to be 
used (and provides a model-driven means of assessing which structure is most relevant 
for each variable in the presence of other variables), and which can be extended to deal 
with repeated or clustered measurements, as well as variable selection. 

The first challenge we address is to provide a framework in which the stochastic 
ordering conditions can be made to hold for any given data set. This provides a means 
to explore the fitting of NPO models to any data set, and facilitates the development of a 
more general approach in which the relationship between the response and explanatory 
variables (e.g. PO or NPO) can be allowed to vary according to the data. The latter is 
our second contribution, and is useful because often we do not know which explanatory 
variables are best modelled using PO or NPO structures in advance, particularly when 
there are a large number of variables. To this end, Tutz and Scholz (2003) propose a 
method that switches between the PO, PPO and NPO models, fitting the model via a 
penalised likelihood approach. However, in this case we also would like to produce an 
estimate of the support under the data for these competing structures for each of the 
variables, which can provide some indirect evidence regarding the mechanisms at play 
in the underlying system. Although the PO model could be viewed as a special case of 
the NPO model, the naive use of a straight NPO model could result in overfitting. 

The challenge with comparing PO and NPO structures is that the dimensionality 
of the system is different in each case (a single regression parameter for the PO model 
corresponds to J — 1 parameters for the NPO model). To deal with this issue we use 
reversible-jump Markov chain Monte Carlo (RJ-MCMC—Green, 1995) to fit the model, 
and Bayesian model averaging (BMA—e.g. Kass and Raftery, 1995) to produce poste¬ 
rior probabilities of association (PPAs) for the support under the prior and the data 
for the choice of PO or NPO structure, averaged across the set of possible models. 
Finally we extend these ideas to incorporate variable selection (see e.g. Dellaportas 
et ah, 2002; O’Hara and Sillanpaa, 2009). Note that an implementation of model choice 
based on Bayes Factors for ordinal regression models was developed in Albert and Chib 
(1997), though each competing model needs to be fitted separately in order to be com¬ 
pared (Chib, 1995). Here we integrate across the competing models in one framework, 
which is likely to be much more efficient when searching across a large model space. An 
alternative Bayesian RJ-MCMC approach to model choice for ordinal probit models is 
developed in Webb and Forster (2008). 

In Section 2 we discuss the Bayesian paradigm and (RJ-)MCMC. In Section 3 we 
introduce the general cumulative link model, and more specihcally the PO, NPO and 
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PPO models. In Section 3.2 we discuss how the Bayesian framework can be used to 
ensure stochastic ordering for specified variable ranges, and we justify this approach in 
practice in Section 3.4. The specific RJ-MCMC sampler for this ordinal model choice 
problem is described in Section 4. We then apply these methods to both simulated data, 
as well as data from a longitudinal study of individual-level risk factors affecting body 
condition score in a dog population in Zenzele, South Africa (Section 5). We conclude 
with a discussion (Section 6). 


2 Bayesian inference and Markov chain Monte Carlo 

All of the models described in this paper will be formulated in a Bayesian framework, 
and htted using Markov chain Monte Carlo (MCMC). We assume that readers are fa¬ 
miliar with the Bayesian framework, but otherwise they are referred to various excellent 
texts available, such as Gilks et al. (1996); Gelman et al. (2004) and Gamerman and 
Lopes (2006). The model fitting algorithms described in this manuscript are specifi¬ 
cally variations of the Metropolis-Hastings (M-H) algorithm (Metropolis et al., 1953; 
Hastings, 1970). 

Reversible-jump MCMC (Green, 1995) is an extension to the classic M-H routine 
that allows the Markov chain to jump between models with different dimensionality. 
Again, we do not discuss the full details of RJ-MCMC here, but for good introductions 
to the method the reader is referred to papers by Waagepetersen and Sorensen (2001) 
and Hastie and Green (2012). 


2.1 Bayesian model choice using reversible-jump MCMC 


Assume that we have V competing models to choose between. We can formulate the 
Bayesian model choice problem as one of estimating the posterior probability that a 
model (My) is true, given the choice of competing models (Mi,..., My). Formally, this 
quantity is defined as 


P{My I D) = 


f{D I My)P{My) 

El=lf{D\My)P{Myy 


( 1 ) 


with P {My) the prior probability of association for model M„, and 


/ {D I My) = [ fiD\ UJy,My) f (cJ y \ My) dU) y , 


( 2 ) 


the integrated likelihood', where D is the data, and is the (multidimensional) param¬ 
eter space for the unknown parameters uSy corresponding to model M„. The quantity 
(1) is sometimes referred to as the posterior probability of association (PPA). These 
ideas for Bayesian model choice go back originally to Jeffreys (1935, 1961), and for a 
detailed introduction see Kass and Raftery (1995) and O’Hara and Sillanpaa (2009). 

To implement model choice in an RJ-MCMC framework, let n = i be the model 
indicator at a given iteration (i.e. the chain is in model Mi), then let p{Mi —>■ Mj) 
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denote the probability that a jump from Mi to Mj is proposed. In order to jump 
between models of differing dimensionality, the parameters are mapped to a set of 
parameters via the inclusion of a set of dummy parameters, Ui and Uj, that are 
chosen to ensure that dmi{u}i^Ui) = dim (wj, Mj). Once these dummy parameters are 
chosen, (jjJi,Ui) is mapped to (u}j,Uj) through a deterministic bijective function gtj, 
such that Qij {uJi,Ui) = (u>j,Mj), and the reverse move is gji {u}j,Uj) = {u}i,Ui). The 
acceptance probability of the move is then given by: 


a = min 


f{D\u;,,M,) /KIM,) P(M,-) 
’/(r>K„M0 /K|M,) P(M,) 

p{Mj Mi)qu ( m ,-) d{u}yUj) ' 

^ p{Mi ^ Mj)qu{ui) ^ d{uJi,u^) 


( 3 ) 


where Qu (ui) is the proposal density for the dummy parameters Ui, and likewise for 
Qu (uj). The hnal quantity in (3) is the absolute value of the determinant of the Jacobian 
matrix. 


One advantage of using RJ methodology is that for a well-mixing model, the PPA 
dehned in (1) for a model M„ can be simply estimated as the proportion of time that 
the chain spends in model v. In Section 4 we show how this routine can be imple¬ 
mented for variable selection, as well as choosing between PO and NPO structures for 
individual variables. For other applications of RJ-MCMC in Bayesian model choice see 
e.g. Richardson and Green (1997) and Dellaportas et al. (2002). 


3 Cumulative link models 

Let Y = (Yi,..., Yj) be a set of counts of individuals in j = 1,..., J ordered categories, 
which can be modelled as 

Y ^ Mult (n,p), (4) 

where n is the number of individuals, and p = (pi,... ,pj) correspond to the probabil¬ 
ities of each individual being in any given category j (such that = !)■ 

have a set of K explanatory variables, Xi = {Xu,... ,XiK), associated with subset i 
of the n individuals (where i = 1,..., /, such that n = ni + ■ ■ ■ + nj), then 

Fi ~ Mult (ni,Pi), (5) 

where Pi = {pn,... ,pij) and 'Yli'j=iPi 3 — 1- ^ fully individual-based model then 

I = n and rij = 1 for all i. Letting Ci correspond to the category that an individual i 
belongs to (such that Ci takes values 1,..., J), then following McCullagh (1980), we can 
model the cumulative probabilities, P {Ci < j) = 'fij through a monotonic increasing 
link function h(-), mapping the interval (0,1) —>■ (—oo, oo), as 

^ ~ ~ (®) 

where Pi = /3o + PiXn -\ - hPxXiK is a linear regression term, and /3 = (/3o, ■. •, Pk) is 

a vector of iL -I-1 regression parameters. In this framework the 0, parameters correspond 
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to a set of latent continuous ‘cut-points’, such that —oo < 0i < • • • < 6j-i < oo. For 
identifiability we set /3o = 0. The probabilities of category membership are then given 
by 

( for j = 1, 

Pd = { 7y-7i0-i) for j = 2,J-1, (7) 

[ 1 - 7i(J-i) for j = J. 

In (6), the effect of the explanatory variables is independent of the grouping, and so 
regardless of the choice of link function the models display strict stochastic ordering (Mc- 
Cullagh, 1980). This means that subject to the constraint —oo < < • • • < 0j_i < oo, 

the cumulative probabilities will be such that 0 < 7^1 < 7 i 2 < • ■ • < 7i(j-i) < 1- 

A more general model would allow the effect of the covariates to vary between the 
groups, such that 

= (8) 

In this case the models are only stochastically ordered for certain ranges of explanatory 
variables (see e.g. Agresti, 2010; Congdon, 2005; Tutz and Scholz, 2003). We discuss 
both forms of these models in detail for the case of the logistic-link function, but also 
extend the discussion to more general cases. 

3.1 Proportional odds model 

A common form for the link function is the logistic link: 

h ( 7 y) = log ~ 

This is known as the proportional odds (PO) model (McCullagh, 1980), so-called because 
the cumulative log-odds ratio for two sets of explanatory variables, Xi and X 2 is given 
by 


logit ( 71 ,) -logit ( 72 ,) = 0j-(3^X,-0^+f3^X2 

= f3^{X2-X,). ( 10 ) 

Hence the cumulative log-odds ratio is proportional to the distance between Xi and 
X 2 (see also Agresti, 2010). 

In the case of the PO model (9), the 9 and /3 parameters are a priori independent, 
and so the joint prior distribution can be written as / (/3, 9) = f {9) / (/3), where we let 

K J-l 

/ (/3)=n / fw=f (^ 1 ) n / I ^^- 1 ) ’ ( 11 ) 

fe=l j=2 

where / {9i) is defined in (— 00 , 00 ), and / {0j \ 0j-i) is defined in the range (0j_i, 00 ) for 
j = 2,..., J — 1 (see also Albert and Chib, 1993; Johnson and Albert, 1999; Congdon, 
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2005). This ensures stochastic ordering for any values of /3. We let 6i ^ N (0,ag^, 
Pk ^ N ^0, a'p'^ (for k = 1,..., /-f) and 

Og I - N (0, al) I (r,_i, oo) for i = 2,..., J - 1, (12) 

where / (rj_i,oo) signifies that the distribution is truncated in the region (Tj_i,oo) 
(i.e. it is a lower-truncated normal distribution) with Tj-i = Oj-i- Other alternative 
choices for the prior distributions include doubly-truncated normals (Congdon, 2005), 
an ordered uniform distribution (Ishwaran, 2000), or a re-parameterisation which maps 
the constrained variables 0 to a set of unconstrained variables, a, which can be given, 
for example, a multivariate normal prior (Fahrmeier and Tutz, 1994; Albert and Chib, 
1997). We choose normal random walk proposal distributions for each fik: such that 


P'k\Pt^ (13) 

where CTp^ is the proposal variance. For the cut-point parameters, 0^, we choose trun¬ 
cated uniform random-walk proposals, such that 


or I = l [/ (r 


[-( 


/l(^) * ^(0 I Z )(0 

6) - T0,uim 9^j ' TO, 


ef-re,efl. 


, mm 


- Te] 


if j = 1, 

if j = 2,...,J-2, 
ifj = J-1, 


(14) 

where rg > 0 controls the size of the maximum unconstrained move away from the 
current value at each iteration. 


3.2 Non-proportional odds model 

The non-proportional odds (NPO) model is specified as 

( x - !y.. ) = 

In this case the regression parameters are allowed to vary with category level, such that 
/iy = fTj Xi (see e.g. Agresti, 2010; Bender and Grouven, 1998; Tutz and Scholz, 2003; 
Congdon, 2005). The key challenge is that in order for stochastic ordering to hold, it is 
necessary that 


-oo < 6»i - I3^X < 02 - <■■■< 0j_i - (3j_^X < oo (16) 

for all X. For identifiability we set each of the intercept parameters /3oj = 0. If we 
have K explanatory variables, then after expanding out the regression in the stochastic 
ordering constraints (16), for any j = 1,..., J — 2, we have that 

K 

k=l 


( 17 ) 
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which must hold for any value of Xk- 

In the first instance, assume that we have a lower and upper bound for the possible 
values of X^, such as would be the case if Xk were categorical. Denote the minimum and 
maximum values of Xk as X™ and X^ respectively. The condition (17) then becomes 

K 

Bj - Bj+i < ^ min ( [Pkj - PkU+i)] [^kj - /3fc(j+i)] X^) ■ (18) 

fc=i 

For brevity, let 

K 

Ckj = min \Xk {Pkj — I3k{j+i)) yXk {(3kj — Pk{j+i))] a-nd Cj = Ckj- (19) 

fc=i 


In a similar manner to the PO model, we can therefore specify the joint prior dis¬ 
tribution of (3 and 0; however this time we do not assume both sets of parameters are 
independent, hence 


/(/3,0) 




fidi) 


, 7-1 


i-2 


K 




/c=l 


( 20 ) 


where / (/3fc) {k = 1,... ,K) and / (0i) are defined as before, and 

/ (0j I 6j-i, f3j, is the probability density function for a truncated normal distri¬ 

bution, N (0,CTg) I (Tj-i , oo) with Tj-i = 9j-i — Cj-i- We choose to update each Pkj 
parameter in turn, conditional on all other parameters remaining fixed. It is tricky to 
define a simple mechanism for truncated sampling of the regression parameters, due to 
the fact that the conditions (16) change according to whether we propose /3(,j < 

or P'kj > ■ Instead we opt here for a simple random-walk proposal, such that 


= ( 21 ) 

where Tp > 0 controls the size of the maximum move away from the current value at each 
iteration. For the cut-point parameters, 6j, we choose truncated uniform random-walk 
proposals, such that 


0' I 9^^^ = 


U {of - TO, min + TO, 0^1, + 


( 


U I max 




u{ 


max 


ey 

,(*) 


■T9,0f+i 


a 


(i) 


Te,aj-i 


3(b 


T9 


if j = 1, 

if j = 2,...,J-2, 
if j = J - 1, 


( 22 ) 


where rg > 0 controls the size of the maximum unconstrained move. 
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3.3 Partial proportional odds 

The partial proportional odds (PPO) model, proposed by Peterson and Harrell (1990), 
allows some variables to have a proportional odds structure and some to not. It takes 
the form 



(23) 


where i = 1,... ,n and j = 1,...,J — 1. The regression parameters (3 correspond to 
the set of explanatory variables, Jti, that have a proportional odds structure, and the 
regression parameters rjj correspond to the set of explanatory variables, t/i, that have a 
non-proportional odds structure. The approaches described in Sections 3.1 and 3.2 can 
be combined in order to fit a PPO model, where the PO and NPO variables {X and 
U) are known in advance. In subsequent sections we formulate an approach whereby 
the optimal choice of PO or NPO structure for each explanatory variable can instead 
be directly estimated by the model. 

3.4 Justification of approach 

The approach described in the previous section assumes that each of the variables 
is bounded in some finite region, which is true for any set of categorical explanatory 
variables, since for a categorical variable Zi with L levels (0,..., L — 1), it is straight¬ 
forward to represent Zi as a set of L — 1 dummy variables, Xu ,..., such that 



(24) 


In the case of continuous or discrete explanatory variables that are bounded in a finite 
range, then the approach described in Section 3.2 will also ensure stochastic ordering 
holds. However, if Xk is defined over an infinite range, then these conditions will break 
for some values of Xk if the Pkj parameters are also unbounded. 

Here we argue for a pragmatic solution to this problem, by considering that it is 
possible to define an upper and lower bound for Xk based on the observed data, and 
then use (18) to set boundary conditions for the conditional priors in (20). Although 
this does not mean that the stochastic ordering will hold for all possible theoretical 
values of Xk, it does ensure that the stochastic ordering will hold for the range of values 
found in the observed data. We make two main arguments to justify this approach: 

1. Although theoretically the values for Xk might be infinite, for any practical appli¬ 
cations of the model there will almost certainly be a finite range of possible values. 
If the observed data are a fair representation of the underlying population, then 
provided the model is a good fit, any population-level inference made from the 
model is likely to be fairly robust (i.e. the posterior distribution is the distribution 
of the parameters given the observed data, so this is explicitly represented within 
the Bayesian paradigm). 
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2. The assumptions underlying any statistical model can only be assessed across the 
range of values used to fit the data, there is no guarantee that the assumptions 
will hold beyond this range, even if it is possible to extrapolate without breaking 
any conditions of the model. 

4 Reversible-jump algorithm for variable selection in 
cumulative odds ordinal regression models 

Consider that we have K parameters describing the explanatory variables. In the first 
instance assume that each parameter measures the effect of a single variable (i.e. there 
are no categorical variables with > 2 levels, or any interaction effects). We will extend 
discussion to these more complex variables in due course. We can model the relationship 
between the response variable Y and each variable Xk in one of three ways: either with 
a PO structure, an NPO structure or no relationship at all; in this example giving 3^ 
possible models. Here we will assume that we have no prior information to distinguish 
between which of these models is most likely, and so assume equal prior probabilities 
of association for each competing model. To model these structures we introduce an 
indicator variable Sk, for k = 1,... ,K, where 



(25) 


To ease programming, it is helpful to treat each of these three possibilities as special 
cases of the NPO-structure, such that if a variable has a PO-structure then this is 
equivalent to setting (3kj = (dk for j = l,...,J—1, with independent point-mass priors 
on /?fe 2 , • ■ ■,/3fc(j-i) such that f {Pkj = 0) = 1. If a variable is excluded then this is 
equivalent to setting (dkj = 0 with a point-mass prior / {(3kj = 0) = 1. This enables us 
to use the conditions in (16) to ensure general stochastic ordering. 

4.1 Adding or removing variables 

Our stochastic search routine updates each variable Xk in a random order, by propos¬ 
ing to add the variable (if currently excluded) or to remove the variable (if currently 
included) with a probability pjump (hence we do nothing with a probability 1 — Pjump). 

To add a variable into the model, we sample whether to use a PO or NPO structure 
with probability ppo and pnpo = 1 ~ Ppo respectively. To add a variable to the model 
with a PO structure, we define a bijective function 


go^po (ui) =ui= (3k, 


(26) 


where ui is sampled from some distribution with p.d.f. g«(-)- To add a variable with an 
NPO structure, we define a bijective function 


go^NPO [ui,..., uj-i) = (ui,..., uj-i) = {Pki, ■ ■ ■, (3k{j-i)) , (27) 


where iti,... ,mj-i are independent and identically distributed (i.i.d.) samples from a 
distribution following g„(-). For a 0 —> PO move, the acceptance probability is 
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a = min 


' / I/9*'*\^*'*0 ^ (^i) Ppo 


(28) 


The probability of adding or dropping a variable, Pjump, is the same for the forwards and 
reverse moves, and so cancel in the acceptance ratio. The determinant of the Jacobian 
matrix is 1. For a 0 —s- NPO move, the acceptance probability is 


a = min 


/(y |/3',0«) 

nl.-l/(«,); 


/(y |/3«,e«) " 


(of 1 of i,/3f ,3f 1 ) 



1 


1 


qu (ui,... ,uj-i) 1-ppo 


(29) 


We let Uj be i.i.d. random variables such that Uj ^ N ^0,(Tp^j, where is the 
proposal variance. To remove a variable that is currently included we can simply reverse 
this process, amending the acceptance probabilities accordingly. 


4.2 Updating included variables 

The second stage of our MCMC routine involves updating the values for any parameters 
that are currently included in the model. In a random order, we select each of the K 
variables in turn, and with probability Pmove we propose new values for the associated 
parameter(s), and with probability 1 — Pmove we propose a shift from PO —>■ NPO (if 
variable k has a PO structure), or NPO —>■ PO (if variable k has an NPO structure). 

If variable k has a PO structure, then to update the value of Pk we simply propose 
a new value from some proposal distribution with p.d.f. qp . The update is then a 

standard Metropolis-Hastings step. Likewise for fikj (j = 1, ■■■,■/ — I) if variable k has 
an NPO structure. 

To switch structures we require a reversible-jump step. To make an NPO —>■ PO 
move—i.e. map (/?fci, •. ■, (ik{j-i)) Pk —we define a bijective function 

ffATPO-i-PO (/3fei, • ■ •,/?fe(j-i)) = /dfe. — 2/3fc2, •. •, ;dfe. — 2/3fe(j_i)) 

= il3k,ui,...,uj-2), (30) 

where Pk- = {J — 1)“^ Sj=i To make the reverse move we do not have to propose 
any new values, and simply use the inverse function 

gPO-yNPO iPk,Ul^ ■ • ■ , Uj- 2 ) 

J-2 

/3fc (4 - J) J- ^ Uj, {Pk - Ui),..., {Pk - uj- 2 ) 

1=1 

= {Pkl, ■ ■ ■ ,Pk{J-l)) ■ 



(31) 
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These choices are based around the moment matching approach of Brooks et al. (2003). 
The acceptance probability for a PO —>■ NPO move is: 




nON) 


— 

£_, > 

3 ( 0 ^ 0(0 

(ef 1 » 

) " /(/3fe 

) 

I 

nfd/ 

x(J-l)( 

'N 'S 

to 1 — 

) /a(0 ad) 

J-2' 

5 

'' Qu (Ui, . . .,Uj-2) 


(32) 


where the final term is the absolute value for the determinant of the Jacobian. Similarly, 
the acceptance probability for an NPO —>■ PO move is 


a = min 



X 


J- 1 



(33) 


We then proceed to update the cut-points, , in the same way as described in Sec¬ 
tion 3.2. 

We note that this general RJ-MCMC algorithm can be adapted in various ways 
simply by altering the move probabilities. For example, we can remove the variable 
selection steps and just allow the model to move betwen the PO and NPO structures 
for each variable by setting pjump = 0. Similarly, we can also fix all parameters to have 
either a PO or NPO structure, both with or without variable selection, by adjusting 
Pjump and Pmove accordingly. 


4.3 Tuning 

In some cases there may be some identifiability issues between regression parameters 
and their corresponding inclusion indicators when implementing variable selection rou¬ 
tines using the framework decribed above. For example, there may be almost identical 
likelihoods when a parameter is removed (set to zero) and when a parameter is present 
but has a value close to zero (e.g. O’Hara and Sillanpaa, 2009). If vague priors are used 
for the regression parameters and inclusion indicators, then these parameters may be 
unidentifiable. One way to control this is to use a more informative prior, such as one 
guided by the data or training runs of the model. However, as O’Hara and Sillanpaa 
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(2009) note, there is a danger that these approaches will contravene the philosophical 
construct that the prior distribution should represent one’s beliefs about the parameters 
before obtaining any data. 

A potential way to tackle this problem in this case is to introduce a hyperprior 
governing the variance component of the priors for the regression parameters, /3. This 
could be done in various ways, but for PO structures we set 

13k ^ N (0, alp) where akf} ~ U (0, f) (34) 

and ^ is the maximum a priori range for akp (O’Hara and Sillanpaa, 2009), and for 
NPO structures we set 


Pkj ~ N (0, aljp) where akjp ~ U (0, f). (35) 

This adds a further complexity to the model since it introduces additional parameters 
to sample during the dimension-jumping steps. For example, a 0 —>■ PO move would 
now consist of moving from 0 —(/3fc, akp), likewise a PO —NPO jump would consist 
of moving from (/?fc,crfc/3)->■ (/3fci, cr/ci/3, ■ • ■, l3k{j-i),<^k{j-i)p) and so on. To do this we 
update each /3 parameter and its corresponding ap parameter at the same time, using 
independent proposal distributions. 

A slight complexity is that the standard deviations must be positive. Hence for a 0 — 
PO or 0 —>■ NPO move (or the reverse moves), we use the same bijective functions as are 
described in Section 4.1, except that the dummy variables for the standard deviations 
are i.i.d. samples from a U (0,f) distribution. The acceptance probabilities are adjusted 
accordingly. For a PO NPO, we use a slightly different bijective function for proposing 
the standard deviations than for the regression parameters. Here we propose values for 
Ml,..., uj -2 as i.i.d. U (max [0, ak — To -], min [uk + To-, C]) variables, and then define 

gpo^NPO ... ,uj-2) = {ak,ak + ui,... ,ak + uj- 2 ) 

= ■ ,CTfc(j_i)) . (36) 

To make the reverse move we do not have to propose any new values, and simply use 
the inverse function 

ffATPO-i-PO , O-fc(J-l)) = 0-fc2 — CTfcl, • ■ • , — CTfei) 

= (CTfc,Mi, . . . ,UJ_2) . (37) 

The acceptance probabilities are updated accordingly, but the additional proposals of 
the standard deviation terms do not change the Jacobian terms in (32) or (33). 

4.4 Including categorical explanatory variables with > 2 levels, and 
interaction effects 

When comparing nested models including interaction effects, it is usual to specify that 
interactions can only be included as long as the corresponding main effect terms are 
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also included, and that higher-order interaction terms are included only if all lower-order 
terms are included (Krzanowski, 1998). These constraints can be incorporated into the 
routines described in Section 4 by altering the move probabilities. For example, consider 
the possible moves for a main effect variable, currently included in the model (with 
a PO structure). If there were no interaction effects, then we propose to exclude the 
variable with probability pjump- If we are modelling interaction effects, then we would 
instead propose to exclude the variable with probability Pjump; where 



(38) 


Likewise, to add interaction effects we need to check that all associated main effects and 
lower-order interaction effects are present first. This ensures that we only drop or add 
variables in the correct manner. 

Explanatory variables with > 2 categories require more than one dummy variable 
to model (see Section 3.4). In this case, when proposing to add or remove a variable of 
this form, we must ensure that all associated dummy variables are added or removed 
simultaneously. We propose to add a variable of this nature with probability pjump, 
and then for each associated dummy variable X/-, we independently propose whether 
these will have PO or NPO structures on addition, with probabilities ppo or pnpo 
respectively. The acceptance probabilities are amended accordingly, with the Jacobian 
term being just a product of the corresponding Jacobian terms for each of the dummy 
variables. The reverse process proceeds in a similar manner. 

5 Applications 

All the following routines were coded in C and R (R Core Team, 2012) and are available 
in an R package called BayesOrd, which in turn uses the coda (Plummer et ah, 2006) 
and multicore (Urbanek, 2011) packages to produce output and run multiple chains in 
parallel. All results are reported to 2 significant figures (s.f.). The development version 
of this package is available at https://github.com/tjmckinley/BayesOrd. Following Link 
and Eaton (2012), we do not thin our MCMC chains once the burn-in has been discarded. 

5.1 Simulation study 

To test the performance of our algorithms, we simulated different data sets assuming 

(a) each variable has a PO structure; 

(b) each variable has an NPO structure; and 

(c) a mixture of PO and NPO variables are used. 

For each scenario we simulated risim = 100 data sets, each containing n = 1000 samples. 
Each sample corresponds to measurements on the response variable (Y) and 7 explana¬ 
tory variables (5 binary, Xi ,..., A 5 , and 2 discrete, Xq and A 7 ). The response is an 
ordinal variable with three levels. 


16 


Bayesian Model Choice in Ordinal Regression Models 


Each simulation proceeds as follows: 

1. In scenario (a), set each explanatory variable, Xk, to have a PO structure. In 
scenario (b) set each structure to NPO, and in scenario (c) sample the structure 
for each Xk from a Bernoulli distribution with probability 0.5. 

2. For each categorical variable Xik (i = 1,..., n; fc = 1,..., 5), sample its value (0 
or 1) from a Bernoulli distribution with probability 0.5. (Ensure that there are at 
least 5% of samples in each group by resampling if required.) 

3. For each discrete Xik = 1,..., n; fc = 6, 7), sample data points as Xik = l^'ikl^ 
where X^. ^ N (OjCr^.). Here, (Jk = Wk\ and cr(. ^ A^(0,5^). 

4. Sample the regression parameters Pkj ^ iV(0, 5^), where j = 1,2 corresponds 
to the length of the response. For each k corresponding to a PO structure, set 
Pkj = 

5. Sample the first threshold parameter, 9i ~ A^(—1,0.1^), and then simulate the 
second threshold parameter, 62 conditional on 0i, the simulated data X and the 
regression parameters (3, ensuring that the stochastic ordering conditions (16) 
hold. To do this we can use (18) to define a lower bound for 62 , and then add some 
positive random noise (we chose the absolute value from a N (0,0.1^) distribution). 
(Note that in the case of scenario (a) we only need to simulate such that 9i < 02, 
since the stochastic ordering conditions always hold.) 

6 . Finally, sample values of the response variable, Yl, from a multinomial distribution 
with probability vector defined using (23). (Ensure that there are at least 5% of 
the samples in each category of the response, else re-simulate.) 

Once the data sets were simulated, we proceeded to fit PO and NPO models in 
both Bayesian and maximum likelihood (ML) frameworks. The Bayesian models were 
fitted using the routines developed in this manuscript and implemented in the BayesOrd 
package. The maximum likelihood PO models were fitted using the polr function in the 
MASS (Venables and Ripley, 2002) package in R, and the ML NPO models were fitted 
using binary logistic regressions, as described in e.g. Bender and Grouven (1998). We 
also fitted a Bayesian PPO model, using the reversible-jump routines described earlier 
to choose between the competing structures for each variable. For the MCMC routines, 
we used 200,000 updates, with the first 10,000 discarded as burn-in. 

To summarise the results we examine the distributions for the squared error be¬ 
tween the true value of the regression parameters and the ML estimate or posterior 
mean accordingly. Table 1 summarises these results. Focussing first on the results from 
scenario (a), we can see that as expected, the PO models perform well, with the ML and 
Bayesian estimates showing a similar degree-of-accuracy. The NPO and PPO models 
also perform well, suggesting that although they are overparameterised, given enough 
data they can produce robust inference on the parameters. 

For data sets simulated using scenario (b), the PO models now fit poorly, but the 
NPO and PPO models once again perform well. Similar patterns are observed for the 
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simulations based on scenario (c), and once again the Bayesian NPO and PPO models 
perform very well in comparison to the other approaches. There are occasional poor 
estimates of the parameters used in the simulations (seen by the high 97.5% credible 
intervals in Table 1). These could be caused either by a lack-of-fit (most likely when 
these values are very high), or more frequently when the data are a sample from the 
extremes of the expected sampling distribution. In any case, the Bayesian methods 
seem more robust to these outliers, particularly compared to the extreme ML NPO 
mismatches. We postulate that this is likely due to the fact that the Bayesian methods 
contain all information in the likelihood, as opposed to the ML NPO method which 
must treat groups independently. 

As a simple exploration of the utility of the Bayesian PPO model for discrimi¬ 
nating between PO and NPO structures, we apply a threshold such that any vari¬ 
able with PPApo > 0.5 is classified as having a PO structure. In this case we have 
7 X 100 possible predictions for each simulation scenario. In the case of scenario (a), 
only 3/700 = 0.43% are incorrectly specified as having an NPO structure. In the case 
of scenario (b), 229/700 = 33% are misclassified as having a PO structure. For sce¬ 
nario (c), 124/700 = 18% are misclassified, of which 95/700 = 14% are NPO variables 
misclassified as PO variables, and 29/700 = 4% are PO variables misclassified as NPO 
variables. This shows a good predictive power, bearing in mind that the Bayesian model 
choice framework intrinsically favours more parsimonious models, and as such the ma¬ 
jority of misclassihcations were NPO variables being reduced to PO variables, such as 
we might expect if the differences between the regression parameters for each level of 
the response are small. Of course these ‘misclassihcations’ may be directly due to quirks 
in the data as a result of random sampling, and to this end Table 1 suggests that the 
Bayesian NPO and PPO estimates are robust compared to other methods, even ac¬ 
counting for any misclassihcation in the actual structure used for the simulations. We 
reiterate that these model hts were performed blind, without a prerequisite descriptive 
analysis that might shed some light on our a priori expectations of variable structures. 
In practice we would take more care with our preliminary model exploration and our 
model diagnostics, but with this in mind we think the methods perform well. 


5.2 Longitudinal study of individual-level risk factors affecting body 
condition score in a dog population in Zenzele, South Africa 

These data form part of a wider study to examine the impact of immunological and 
demographic factors on canine rabies vaccination coverage. This study was conducted 
in four locations: Braamfischerville and Zenzele in Gauteng province, South Africa; and 
Antiga and Kelusa in Bali province, Indonesia. Full details of the study, and a compre¬ 
hensive analysis of all the data collected from each of the sites is provided in Morters 
et al. (2014). 

To illustrate the methodology, we focus attention on one particular data set from 
Zenzele, exploring individual-level risk factors associated with body condition score in 
a population of dogs. The data set consists of 2746 entries, for 738 dogs, with each dog 
examined between 1 and 17 times across the period 3rd March 2008-8th April 2011. 
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Scenario 

Algorithm 

2.5% 

Median 

97.5% 


ML (PO) 

0.00 

0.06 

3.5 


Bayesian (PO) 

0.00 

0.06 

1.8 

(a) PO 

ML (NPO) 

0.00 

0.10 

7.8 


Bayesian (NPO) 

0.00 

0.07 

3.1 


Bayesian (PPO) 

0.00 

0.06 

1.9 


ML (PO) 

0.02 

5.2 

85 


Bayesian (PO) 

0.02 

6.3 

86 

(b) NPO 

ML (NPO) 

0.00 

0.32 

1821 


Bayesian (NPO) 

0.00 

0.37 

34 


Bayesian (PPO) 

0.00 

0.48 

37 


ML (PO) 

0.00 

2.5 

61 


Bayesian (PO) 

0.00 

3.4 

66 

(c) PPO 

ML (NPO) 

0.00 

0.24 

321 


Bayesian (NPO) 

0.00 

0.18 

18 


Bayesian (PPO) 

0.00 

0.16 

20 


Table 1: Summaries of squared error between estimated and true values, for data sets 
generated using three different scenarios (defined in the main text). Within each panel, 
n-sim = 100 simulated data sets are generated, each of size n = 1000 samples. Each panel 
is further stratified by the type of model (PO, NPO, PPO) and the fitting mechanism 
(ML or Bayesian). 


Body condition score (BCS) was assessed using a nine-point scoring system (German 
and Holden, 2006), with each dog being scored by two assessors simultaneously. The 
system assigns a score of 1-9, with 1 being very underweight, 5 being normal, and 
9 being obese. To maintain a reasonable sample size in each group, we amalgamated 
the extreme scores, resulting in 5 BCS groups: 1-2, 3, 4, 5 and 6-9. Eight explana¬ 
tory variables were collected: gender (male/female), OPL (oestrus-pregnancy-lactation; 
coded as normal/lactating/pregnant), number of dogs in the sample unit (discrete be¬ 
tween 1-9), age (0-6 months, 7-12 months, 13-36 months and >36 months), sterilisa¬ 
tion (true/false), confinement (true/false), owner reported clinical signs in the previous 
7 days (none/minor/major-short duration/major-medium duration/major-long dura¬ 
tion) and clinical signs observed by enumerator during interview (none/minor/major). 
In the interests of comparison, we fitted two separate models, the first assuming that 
the maximum BCS between the two assessors was correct, and the second assuming the 
minimum was correct. Summaries of the data are provided in Table 2, and distributions 
by BCS are shown in Figure 1. 

To account for the repeated measurements, an individual dog-level term, 'i/u-, was 
introduced, with prior distribution 

(39) 

where Di denotes the specific dog corresponding to observation i {Di = 1,...,738), 
and has a vague gamma hyperprior with shape and rate parameters 0.01 and 0.01 
respectively (i.e. mean=l and variance=100). At each iteration 30% of the terms 
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Variable 

Level 

Count / summary 


1-2 

123 


3 

462 

BCS 

4 

927 


5 

858 


6-9 

376 

Gender 

Female 

1468 

Male 

1278 


Normal 

2483 

OPL 

Lactating 

160 


Pregnant 

103 



Min: 1 



Lower quartile: 1 

# dogs in SU 


Median: 1 

Mean: 1.8 



Upper quartile: 2 



Max: 9 


0 -6m 

294 

Age 

7-12m 

452 

13-36m 

587 


> 36m 

1413 


No 

2679 

Sterilisation 


Yes 

67 

Conhnement 

No 

1955 

Yes 

791 


None 

2300 

Owner reported 

Minor 

165 

clinical signs in 

Major/short 

65 

previous 7 days 

Major/med. 

135 


Major/long 

81 

Clinical signs 

None 

1983 

observed by 

Minor 

497 

enumerator 

Major 

266 


Table 2: Marginal summaries of the data (assuming maximum BCS). The final column 
contains counts unless otherwise stated. For comparison, the BCS counts when choosing 
the minimum BCS are 191, 652, 1082, 617 and 204 respectively. 


were updated in turn at random, using a uniform random walk proposal with the max¬ 
imum proposal jump given by r^. Likewise the variance cr^ was also updated in the 
same manner with the maximum proposal jump given by . 

To complete the Bayesian specification we set the prior variance for the cut-points, 
CTg = 1, and the maximum a priori value for the standard deviations of the regression 
parameter priors, ^ = 20 (following O’Hara and Sillanpaa, 2009). The proposal param- 
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eters were: rp = 1, rg = 1, app = 1, = 1, = 1 and To-^ = 1. Two chains were 

run, and after a short training run of 1,000 iterations, from which initial values for the 
main chains were generated, we ran 500,000 iterations with the first 50,000 discarded as 
burn-in. To produce the fitted plots (Figure 1) we took 2,000 samples from the posterior. 
Full trace and density plots are given in Supplementary Materials. 

The PPAs for each variable, averaged across the competing models are shown in 
Table 3. For clarity, values < 1 x 10“^ are rounded to zero. More precise results are 
shown in Supplementary Table SI. Using conventional rules-of-thumb for interpreting 
these values (see e.g. Viallefont et ah, 2001), if a variable has a PPA of inclusion of 
<0.5, then we consider that there is negligible evidence to support this variable being 
associated with the response. PPAs of inclusion of 0.5-0.75 are considered weak evidence, 
0.75-0.95 positive evidence, 0.95-0.99 strong evidence and >0.99 very strong evidence. 



Max. BCS 

Min. BCS 

Variable 

Level 

PO 

NPO 

Exc. 

PO 

NPO 

Exc. 

Gender 

F 







M 

0.52 

0 

0.48 

0.72 

0 

0.28 

OPL 

Norm. 







Lac. 

1 

0 

0 

1 

0 

0 


Preg. 

1 

0 

0 

1 

0 

0 

# dogs in SU 

0.025 

0 

0.98 

0.038 

0 

0.96 

Age 

0 -6m 

7-12m 

1 

0 

0 

1 

0 

0 

13-36m 

1 

0 

0 

1 

0 

0 


>36m 

0 

1 

0 

0.82 

0.18 

0 

Sterilised 

N 

0.5 

0 

0.5 

0.43 

0 

0.57 

Y 

Confined 

N 

0.98 

0.02 

0 

1 

0 

0 

Y 


None 







Owner reported 

Minor 

0 

0 

1 

0 

0 

1 

clinical signs in 

Major/short 

0 

0 

1 

0 

0 

1 

previous 7 days 

Major/med. 

0 

0 

1 

0 

0 

1 


Major/long 

0 

0 

1 

0 

0 

1 

Clinical signs 

None 







observed by 

Minor 

1 

0 

0 

1 

0 

0 

enumerator 

Major 

1 

0 

0 

1 

0 

0 


Table 3: Posterior probabilities of association for different variables, averaged across all 
models. 


In this case we can see that there is consistency in the variables identihed as being 
important from both analyses (i.e. using the maximum and minimum BCS scores as the 
response). In this case we identify gender as showing weak evidence of an association; 
and OPL, age, confinement and enumerator observed clinical signs as showing very 
strong evidence of an association. 
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Figure 1: Marginal posterior predictive distributions for the explanatory variables, 
against the observed data. Bars represent the data, the points are the marginal pre¬ 
dictive means and the error bars are the 95% prediction intervals. 
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For those variables with PPAs > 0.5, we can see that in almost all of these cases the 
PO structure is preferred, which is reflected in the model averaged log cumulative odds 
ratios shown in Table 4. For clarity, where the posterior means and SDs are the same 
across the levels (to 2 s.f.—i.e. the variable has an effective PO structure), we show only a 
single result. For all intents and purposes the only variable that shows any possible non- 
negligible support for an NPO structure is the >36 month age class. Posterior predictive 
distributions for the observed values can be obtained, and the marginal means and 95% 
prediction intervals for the categorical explanatory variables are shown in Figure 1. 

The overall patterns using both the minimum BCS and maximum BCS as response 
are the same, and so in the following discussion we will focus on the estimates obtained 
from using the maximum BCS only. When assessing the posteriors, it is possible to 
produce conditional inference based on a given model, or produce a posterior based on 
a weighted mixture of the posteriors from each of the models being averaged over (see 
e.g. Kass and Raftery, 1995; Viallefont et ah, 2001; O’Hara and Sillanpaa, 2009). In 
the case of variables that have a non-zero posterior probability of exclusion, the latter 
approach will shrink these estimates towards zero. 

With this in mind, males are on average 1.2 times more likely to have a lower BCS 
than females. However, lactating females are, on average, 3.0 times more likely to have 
a lower BCS than equivalent males and non-pregnant females. Pregnant females on the 
other hand, are 1.6 times more likely to be have a higher BCS. For this variable, there is 
mixed support for the inclusion of gender with a PO structure, and exclusion altogether. 
Therefore, in this case these posterior estimates will have been shrunk towards zero 
relative to the conditional posterior given inclusion. This effect will be minimal for the 
other variables discussed below which each have a high probability of inclusion. 

The effect of age is interesting; relative to the 0-6 month category, dogs aged between 
7-12 months are 1.1 times more likely to have a higher BCS; dogs aged between 13-36 
months are 1.2 times more likely to have a lower BCS, and as adults (>36 months) they 
are between 1-4.5 times more likely to have a higher BCS, depending on the category 
level (since the adult age class has very strong evidence of an NPO structure). A likely 
explanation is that this pattern reflects normal morphological variation—generally, as 
dogs become older their activity levels will decrease, resulting in a general increase in 
BCS. 

An interesting finding in this analysis is that other than the > 36 month age cat¬ 
egory, all other variables had very strong support for a PO structure (conditional on 
inclusion). One of the key motivations for the study that generated these data was to 
examine the hypothesis that these canine populations are regulated by environmen¬ 
tal resource constraints (as they would be in wild populations). If this hypothesis is 
true, then consistent with empirical evidence in other species and ecological theory, the 
thin dogs should generally be the ones with highest energy requirements (particularly 
lactating and growing dogs). The marginal distributions shown in Figure 1 provide qual¬ 
itative evidence against this hypothesis, since whilst on average there was a tendency 
for lactating dogs to be thinner than non-lactating dogs, overall the body condition 
distribution for lactating dogs shows that most dogs are in reasonable body condition, 
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Maximum BCS 

Minimum BCS 

Variable 

Level of 

Level of 

Mean 

SD 

Mean 

SD 

(baseline level) 

variable 

response 



1 





Gender (F) 

M 

2 

3 

-0.18 

0.20 

-0.28 

0.21 



4 







1 






Lac. 

2 

3 

-1.1 

0.18 

-1.3 

0.18 

OPL (normal) 


4 






1 

0.46 

0.25 




Preg. 

2 

0.46 

0.22 

0.47 

0.22 


3 

0.46 

0.22 



4 

0.46 

0.22 





1 






7-12m 

2 

3 

0.12 

0.15 

0.35 

0.16 



4 







1 





Age (0“6m) 

13-36m 

2 

3 

-0.19 

0.15 

0.09 

0.15 



4 







1 

-0.0065 

0.22 

0.64 

0.28 


>36m 

2 

0.27 

0.18 

0.69 

0.2 


3 

0.74 

0.17 

0.79 

0.2 



4 

1.5 

0.21 

0.86 

0.31 



1 

-0.56 

0.14 



Confined (N) 


2 

-0 55 

0 12 



Y 

3 

-0.55 

0.12 

-0.58 

0.12 



4 

-0.55 

0.13 





1 





Clinical signs 

Minor 

2 

-0.14 

0.11 

-0.14 

0.11 

3 

observed by 


4 





enumerator 


1 





(None) 

Major 

2 

3 

-1 

0.15 

-1.3 

0.15 



4 






Table 4: Model averaged posterior means and standard deviations for the log cumulative 
odds ratios. Only those variables with non-negligible assocation to response (i.e. a PPA 
of inclusion of >0.5) are shown. For clarity, those variables that have the same means 
and SDs for each level of the response (to 2 s.f.) are shown as a single entry. 
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with fewer dogs in the extremes. Crucially there are underweight lactating dogs and un¬ 
derweight non-lactating dogs, and there are overweight lactating dogs and overweight 
non-lactating dogs—consistent with variable food availability most likely from an owner, 
rather than from the environment (e.g. scavenging). The same is true for young dogs. 

A similar argument could be made by examining the evidence for PO versus NPO 
structures for these key variables (particularly OPL and age). Under the hypothesis of 
environmental constraints limiting population size, then we might expect lactating and 
young dogs to be more likely to exhibit an NPO structure, with decreasing negative log- 
odds ratios with increasing BCS. We do not observe this here. There is strong evidence 
of an NPO structure for the > 36 month age class, though this is again consistent with 
the population being ‘managed’, rather than acting as a wild population. 

Similar results are obtained for all four study regions. This information has impor¬ 
tant implications for designing optimal vaccination strategies against rabies in these 
populations. For full details of the study, and a comprehensive discussion about all the 
collected evidence, see Morters et al. (2014). 

Confinement is associated with a lower BCS, with confined dogs being 1.7 times more 
likely to have a lower BCS than unconfined dogs. Although confinement, as defined in 
this study, was highly variable (with regards to the length of time dogs were confined 
and the frequency that they were released), in general it was observed that dogs that 
were tied up were often neglected. See Morters et al. (2014) for a full discussion on these 
issues. 

Finally, the clinical signs variables cover a wide range of possible conditions. These 
were classified into ‘minor’ (considered unlikely to cause weight loss, such as localised 
skin lesions and lameness) and ‘major’ (considered likely to cause weight loss, such as 
vomiting and lethargy). This variable serves as an indicator of the general health of the 
dog, and it can be seen that as expected, dogs that show evidence of an ongoing medical 
condition (that is likely to cause weight loss), are more likely to have lower BCS values 
than their healthy counterparts: 1.2 times more likely for minor ailments and 2.7 times 
more likely for major ailments. 


6 Discussion 

We have introduced a method for fitting cumulative link ordinal regression models that 
does not require a priori assumptions regarding PO or NPO structures to model the 
relationship between the response and explanatory variables. For categorical explana¬ 
tory variables we show how stochastic ordering can be ensured in the case of NPO 
models, and provide a pragmatic approach to ensuring that stochastic ordering holds 
for continuous or discrete covariates within the range of the observed data. In addition 
these approaches can be extended to incorporate variable selection within a Bayesian 
framework, allowing posterior probabilities of association to be produced for competing 
models. It is straightforward to include individual-level terms to account for repeated 
measures, and Bayesian model averaging can to be used to provide weighted PPA esti¬ 
mates for the parameters that account for model uncertainty. We have illustrated the 
methods on a large-scale real-life data set. 
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The method uses reversible-jump MCMC to jump between models of differing dimen¬ 
sionality. However, implementational difficulties can exist with this method, particularly 
when jumping between models where the dimensionality is quite different. We found 
that the simple proposal mechanisms used throughout the paper worked well for this 
application and others we have tried. Nonetheless it is likely that specific situations may 
require more additional tuning (as with any MCMC method). For example, if some of 
the intervals specified by the stochastic ordering conditions (16) are small, and the pro¬ 
posal size, T /3 is too large, then for NPO structures this may result in a large proportion 
of proposed values for the Pkj parameters being rejected as a result of breaking the 
prior conditions on stochastic ordering. An alternative would be to sample from some 
form of truncated distribution, though due to the nature of the constraints, this is not 
trivial. 

Another interesting alternative would be to use some form of shrinkage model, where 
the model is defined as 



(40) 


where i = 1,... ,n and j = 1,...,J — 1. The conditional prior distributions for the 
r]kj parameters are centred around the corresponding (3^ with a small prior variance. 
The /3fc parameters can be given the same prior distribution as before. In this variation 
the model does not change dimensionality, and so no reversible-jump step is required. 
The r]kj parameters then correspond to the degree to which the parameter estimates 
deviate away from the proportional odds structure. This idea could also be expanded 
to incorporate variable selection in various ways (see e.g. O’Hara and Sillanpaa, 2009). 

Using single-component updates with simple random-walk proposals can also pro¬ 
duce Markov chains that are highly autocorrelated, and thus require a large number 
of iterations and a lot of thinning. Adaptive proposal mechanisms (Haario et ah, 2001; 
Roberts and Rosenthal, 2009) exist for standard (i.e. non-transdimensional) MCMC, 
that can automatically tune the proposal distributions to produce much more efficient 
chains in terms of both convergence and mixing. However, it is not currently under¬ 
stood whether these sorts of approaches hold for transdimensional routines, and this is 
a key area of ongoing research for those who are developing these methods (Hastie and 
Green, 2012). For the kinds of examples shown in this paper the runtimes required to 
produce a reasonable number of pseudo-independent samples are not prohibitive, and 
so we do not worry about this aspect here. It is not the purpose of this paper to provide 
a catch-all routine that works well in every situation, but rather to provide a flexible 
method that can be adapted to deal with different situations as required. 

We occasionally noticed some identifiability issues when fitting NPO models, pre¬ 
dominantly between categorical explanatory variables with low counts in some of the 
groups, and the cut-off parameters. This can be tackled in two main ways: firstly, the 
variables can be recategorised to ensure that there is a minimum number of individuals 
in each group. Secondly, we can start the MCMC routines using more informative initial 
values. Appealing to the Occam’s Razor principal, in this paper we decided to generate 
initial values by producing a short training run, using a PO model that includes all of 
the explanatory variables (but ignoring the repeated measures). We then ran the full 
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model using the parameter values from the final iteration of the training run as initial 
values. A similar approach would be to generate maximum likelihood estimates for the 
simple PO model and use these as initial values instead. 

An example of the utility of these routines is that stochastic ordering can be ensured 
for continuous/discrete covariates within the range of the observed data. It is theoreti¬ 
cally possible to ensure these conditions hold for any finite range of values, if an upper 
or lower bound was known from sources of information other than the observed data. 
In any case, if it is of interest to extrapolate beyond the range of the data, then it is 
possible to use the posterior samples to explore the range of covariate values over which 
the stochastic conditions will hold—essentially building a posterior distribution for the 
range of valid values. This could also be used as a form of sensitivity analysis to the 
model assumptions based on the model fit. 

It is also worth noting that although we have illustrated these methods using a 
logistic link function, the methods are applicable to any monotonically increasing link 
function (though of course the interpretation of the regression parameters will no longer 
be in terms of the cumulative odds). 


Supplementary Material 

Supplementary Materials: Bayesian model choice in cumulative link ordinal regression 
models: an application in a longitudinal study of risk factors affecting body condition 
score in a dog population in Zenzele, South Africa (DOI: 10.1214/14-BA884SUPP; .zip). 
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